normality test
Statistical Analysis of Sentence Structures through ASCII, Lexical Alignment and PCA
While syntactic tools such as part-of-speech (POS) tagging have helped us understand sentence structures and their distribution across diverse corpora, they are complex and pose a challenge in natural language processing (NLP). This study focuses on assessing sentence-structure balance - the harmonious usage of nouns, verbs, determiners, and so on - without relying on such tools. It proposes a novel statistical method that represents the text of 11 corpora from various sources as American Standard Code for Information Interchange (ASCII) codes, examines their lexical category alignment after compressing the representations with PCA, and analyzes the results through histograms and normality tests such as the Shapiro-Wilk and Anderson-Darling tests. By focusing on ASCII codes, this approach simplifies text processing; it does not replace syntactic tools but complements them as a resource-efficient means of assessing text balance. The story generated by Grok shows near normality, indicating balanced sentence structures in LLM outputs, whereas 4 of the remaining 10 corpora pass the normality tests. Further research could explore applications in text quality evaluation and style analysis, with syntactic integration for broader tasks.
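The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code: the sentences, padding scheme, and one-component PCA are all assumptions made for the sketch.

```python
# Assumed pipeline: encode each sentence as a zero-padded row of ASCII codes,
# compress the rows with PCA, then run normality tests on the PCA scores.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

sentences = [
    "The quick brown fox jumps over the lazy dog",
    "A balanced sentence mixes nouns and verbs well",
    "Determiners and adjectives shape the reading flow",
    "Short clauses can still carry complete thoughts",
    "Every corpus has its own structural fingerprint",
    "Writers vary rhythm across paragraphs and pages",
    "Statistical tests summarize what the eye misses",
    "Histograms reveal the shape of the score spread",
]

# Fixed-width ASCII matrix: one row per sentence, zero-padded on the right.
width = max(len(s) for s in sentences)
X = np.zeros((len(sentences), width))
for i, s in enumerate(sentences):
    X[i, :len(s)] = [ord(c) for c in s]

# Compress each sentence to a single score and test the scores for normality.
scores = PCA(n_components=1).fit_transform(X).ravel()
w_stat, p_value = stats.shapiro(scores)
ad_result = stats.anderson(scores, dist="norm")
print(f"Shapiro-Wilk W={w_stat:.3f}, p={p_value:.3f}")
```

A near-normal histogram of the scores would then be read as evidence of balanced sentence structure, as the abstract proposes.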
Visually Evaluating Generative Adversarial Networks Using Itself under Multivariate Time Series
Visually evaluating the goodness of generated multivariate time series (MTS) is difficult, especially when the generative model is a Generative Adversarial Network (GAN). We present a general framework named Gaussian GANs to visually evaluate a GAN using the GAN itself under the MTS generation task. First, we find the transformation function in the multivariate Kolmogorov-Smirnov (MKS) test by explicitly reconstructing the architecture of the GAN. Second, we conduct a normality test on the transformed MTS, where the Gaussian GAN serves as the transformation function in the MKS test. To simplify the normality test, an efficient visualization is proposed using the chi-square distribution. In our experiments on the UniMiB dataset, we provide empirical evidence that the normality test using Gaussian GANs and chi-square visualization is effective and credible.
- North America > United States > Washington > King County > Seattle (0.04)
- Asia (0.04)
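The chi-square visualization idea can be sketched in isolation. This is an assumed form of the check, not the paper's implementation: if a transformation maps generated vectors to i.i.d. standard normals in d dimensions, their squared norms should follow a chi-square(d) law, which is easy to test and plot.

```python
# Sketch: standard-normal vectors stand in for Gaussian-GAN-transformed MTS.
# Squared norms of d-dimensional standard normals follow chi-square(d).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 8                                   # hypothetical transformed dimension
z = rng.standard_normal((2000, d))      # stand-in for transformed samples
sq_norms = np.sum(z**2, axis=1)         # should be ~ chi-square(d)

# One-sample Kolmogorov-Smirnov test against the chi-square(d) CDF.
ks_stat, p_value = stats.kstest(sq_norms, stats.chi2(df=d).cdf)
print(f"KS stat={ks_stat:.3f}, p={p_value:.3f}")
```

A histogram of `sq_norms` overlaid with the chi-square(d) density gives the visual version of the same check.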
Box-Cox Transformation for Normalizing a Non-normal Variable in R - Universe of Data Science
The Box-Cox transformation is a commonly used remedy when normality is not met. This comprehensive guide covers estimation techniques for the Box-Cox transformation parameter and shows how to apply the transformation in practice in R. In this tutorial, we first discuss two estimation techniques for the Box-Cox transformation parameter: maximum likelihood estimation (MLE) and estimation via normality tests. We then show how to apply the Box-Cox transformation in practice.
- North America > United States > New York (0.06)
- North America > United States > California > Ventura County > Thousand Oaks (0.06)
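The tutorial itself works in R; for orientation, the same MLE-based fit can be sketched in Python, where `scipy.stats.boxcox` estimates lambda by maximum likelihood. The skewed sample below is synthetic, chosen only to show the before/after effect on a normality test.

```python
# Box-Cox via MLE on a synthetic right-skewed (positive) sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=0.7, size=500)   # strictly positive

transformed, fitted_lambda = stats.boxcox(skewed)       # MLE for lambda
_, p_before = stats.shapiro(skewed)
_, p_after = stats.shapiro(transformed)
print(f"lambda={fitted_lambda:.3f}, Shapiro p before={p_before:.2g}, after={p_after:.2g}")
```

Note that Box-Cox requires strictly positive data; for lognormal input the fitted lambda lands near 0, i.e., close to a log transform.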
Testing for Normality with Neural Networks
In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples from them. Numerical experiments conducted on small samples with no more than 100 elements indicated that our trained neural network was more accurate and far more powerful than the most frequently used and most powerful standard tests of normality: Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera, as well as kernel goodness-of-fit tests. The neural network achieved an AUROC score of almost 1, corresponding to a nearly perfect binary classifier. Additionally, the network's accuracy was higher than 96% on a set of larger samples with 250-1000 elements. Since the normality of data is an assumption of numerous techniques for analysis and inference, the neural network constructed in this study has very high potential for use in the everyday practice of statistics, data analysis and machine learning in both science and industry.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Serbia > Central Serbia > Belgrade (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.46)
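The classification framing can be illustrated with a toy version of the idea. This is not the paper's network or training setup: the featurization (standardized order statistics), the alternative distribution (exponential), and the network size are all assumptions chosen to keep the sketch small.

```python
# Toy sketch: normality testing as binary classification with a small MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n_samples, sample_size = 600, 50    # hypothetical choices for the sketch

def featurize(sample):
    # Standardized order statistics: location/scale-free sample shape.
    return np.sort((sample - sample.mean()) / (sample.std() + 1e-12))

X, y = [], []
for _ in range(n_samples):
    if rng.random() < 0.5:
        X.append(featurize(rng.standard_normal(sample_size))); y.append(1)
    else:
        X.append(featurize(rng.exponential(size=sample_size))); y.append(0)
X, y = np.array(X), np.array(y)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X[:500], y[:500])
acc = clf.score(X[500:], y[500:])
print(f"held-out accuracy: {acc:.2f}")
```

Normal versus exponential is an easy pair; the paper's contribution is making this discrimination work across a broad family of alternatives at small sample sizes.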
Sliced generative models
Knop, Szymon, Mazur, Marcin, Tabor, Jacek, Podolak, Igor, Spurek, Przemysław
In this paper we discuss a class of AutoEncoder-based generative models built on a one-dimensional sliced approach. The idea is to reduce the discrimination between samples to the one-dimensional case. Our experiments show that such methods can be divided into two groups: the first consists of methods that modify standard normality tests, while the second is based on classical distances between samples. It turns out that both groups yield correct generative models, but the second gives a slightly faster decrease of the Fréchet Inception Distance (FID).
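The second group's core idea - project to one dimension, then apply a classical distance - can be sketched as a sliced distance. This is a generic illustration, not the paper's method: the projection count and the use of the 1D Wasserstein distance are assumptions.

```python
# Sliced distance sketch: average a classical 1D distance over random
# one-dimensional projections of two multivariate samples.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_distance(a, b, n_projections=50, seed=0):
    rng = np.random.default_rng(seed)
    d = a.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)          # random unit direction
        total += wasserstein_distance(a @ theta, b @ theta)
    return total / n_projections

rng = np.random.default_rng(3)
same = sliced_distance(rng.standard_normal((500, 4)),
                       rng.standard_normal((500, 4)))
shifted = sliced_distance(rng.standard_normal((500, 4)),
                          rng.standard_normal((500, 4)) + 2.0)
print(f"same: {same:.3f}, shifted: {shifted:.3f}")
```

Swapping the 1D distance for a normality-test statistic on the projections gives the flavor of the first group of methods.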
Cardiology Admissions from Catheterization Laboratory: Time Series Forecasting
Choudhury, Avishek, Perumalla, Sunanda
Emergent and unscheduled cardiology admissions from the cardiac catheterization laboratory add complexity to the management of cardiology and in-patient departments. In this article, we sought to study the behavior of cardiology admissions from the catheterization laboratory using time series models. Our research involves retrospective cardiology admission data from March 1, 2012, to November 3, 2016, retrieved from a hospital in Iowa. Autoregressive integrated moving average (ARIMA), Holt's method, the mean method, the naïve method, the seasonal naïve method, exponential smoothing, and the drift method were implemented to forecast weekly cardiology admissions from the catheterization laboratory. ARIMA(2,0,2)(1,1,1) was selected as the best-fit model, with the minimum sum of errors, Akaike information criterion, and Schwarz Bayesian criterion. The model failed to reject the null hypothesis of stationarity, lacked evidence of independence, and rejected the null hypothesis of normality. The implications of this study will not only improve the catheterization laboratory staff schedule and advocate efficient use of imaging equipment and inpatient telemetry beds, but also equip management to proactively tackle inpatient overcrowding, plan for physical capacity expansion, and so forth.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.50)
- North America > United States > Iowa (0.25)
- Oceania > Australia (0.04)
- (4 more...)
How to Transform Data to Better Fit The Normal Distribution
A large portion of the field of statistics is concerned with methods that assume a Gaussian distribution: the familiar bell curve. If your data has a Gaussian distribution, the parametric methods are powerful and well understood, which gives some incentive to use them if possible - even if your data does not appear Gaussian at first. It is possible that your data does not look Gaussian or fails a normality test, but can be transformed to fit a Gaussian distribution.
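A minimal example of that idea, on assumed synthetic data: a right-skewed variable fails a normality test, but a simple log transform brings it much closer to Gaussian.

```python
# Log transform of a skewed positive variable, checked with a normality test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.lognormal(mean=2.0, sigma=0.5, size=300)   # skewed, positive

_, p_raw = stats.normaltest(data)          # D'Agostino-Pearson test
_, p_log = stats.normaltest(np.log(data))  # log of lognormal is normal
print(f"p raw={p_raw:.3g}, p log={p_log:.3g}")
```

When no single analytic transform works, parameterized families such as Box-Cox (discussed above) search for the best transform automatically.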
You have created your first Linear Regression Model. Have you validated the assumptions?
With the dawn of the age of data science, there is an increased interest in learning and applying algorithms, not just by business analysts or data scientists, but by several other professionals whose core job may not be crunching data or building models. A good sign, indeed, if one understands the when, why and how of applying these fantastic techniques. If your scatterplot shows a curvilinear relationship, keep in mind that higher-order polynomials (degree 2 or above) may do a better job of modelling the data. Compare models and their statistics, and decide for yourself which model best explains your data. For validity of a linear regression model, the VIF (Variance Inflation Factor) should not be too high. How high is too high?
Clonal analysis of newborn hippocampal dentate granule cell proliferation and development in temporal lobe epilepsy
Singh, Shatrunjai P., LaSarge, Candi L., An, Amen, McAuliffe, John J., Danzer, Steve C.
Hippocampal dentate granule cells are among the few neuronal cell types generated throughout adult life in mammals. In the normal brain, new granule cells are generated from progenitors in the subgranular zone and integrate in a typical fashion. During the development of epilepsy, granule cell integration is profoundly altered. The new cells migrate to ectopic locations and develop misoriented basal dendrites. Although it has been established that these abnormal cells are newly generated, it is not known whether they arise ubiquitously throughout the progenitor cell pool or are derived from a smaller number of bad actor progenitors. To explore this question, we conducted a clonal analysis study in mice expressing the Brainbow fluorescent protein reporter construct in dentate granule cell progenitors. Mice were examined 2 months after pilocarpine-induced status epilepticus, a treatment that leads to the development of epilepsy. Brain sections were rendered translucent so that entire hippocampi could be reconstructed and all fluorescently labeled cells identified. Our findings reveal that a small number of progenitors produce the majority of ectopic cells following status epilepticus, indicating that either the affected progenitors or their local microenvironments have become pathological. By contrast, granule cells with basal dendrites were equally distributed among clonal groups. This indicates that these progenitors can produce normal cells and suggests that global factors sporadically disrupt the dendritic development of some new cells. Together, these findings strongly predict that distinct mechanisms regulate different aspects
- North America > United States > Ohio > Hamilton County > Cincinnati (0.14)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
An Iterative BP-CNN Architecture for Channel Decoding
Liang, Fei, Shen, Cong, Wu, Feng
Inspired by recent advances in deep learning, we propose a novel iterative BP-CNN architecture for channel decoding under correlated noise. This architecture concatenates a trained convolutional neural network (CNN) with a standard belief-propagation (BP) decoder. The standard BP decoder is used to estimate the coded bits, followed by a CNN that removes the estimation errors of the BP decoder and obtains a more accurate estimation of the channel noise. Iterating between the BP decoder and the CNN gradually improves the decoding SNR and hence results in better decoding performance. To train a well-behaved CNN model, we define a new loss function that involves not only the accuracy of the noise estimation but also a normality test on the estimation errors, i.e., a measure of how closely the estimation errors follow a Gaussian distribution. Introducing the normality test into CNN training shapes the residual noise distribution and further reduces the BER of the iterative decoding, compared to using the standard quadratic loss function. We carry out extensive experiments to analyze and verify the proposed framework. The iterative BP-CNN decoder has better BER performance with lower complexity, is suitable for parallel implementation, does not rely on any specific channel model or encoding method, and is robust against training mismatches. All of these features make it a good candidate for decoding modern channel codes.
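The shape of such a loss can be sketched with a stand-in normality term, since the abstract does not restate the exact statistic used in training. The Jarque-Bera-style penalty below (squared skewness plus squared excess kurtosis) is an assumption chosen for illustration, as is the weight `lam`.

```python
# Sketch: quadratic error plus a Jarque-Bera-style penalty that pushes
# the residual estimation errors toward a Gaussian shape.
import numpy as np
from scipy import stats

def normality_penalized_loss(estimated_noise, true_noise, lam=0.1):
    err = estimated_noise - true_noise
    mse = np.mean(err**2)
    skew = stats.skew(err)
    kurt = stats.kurtosis(err)                     # excess kurtosis
    jb = len(err) / 6 * (skew**2 + kurt**2 / 4)    # Jarque-Bera statistic
    return mse + lam * jb / len(err)               # normalized penalty

rng = np.random.default_rng(7)
noise = rng.standard_normal(1000)
good = normality_penalized_loss(noise + rng.normal(0, 0.1, 1000), noise)
bad = normality_penalized_loss(noise + rng.exponential(0.1, 1000), noise)
print(f"gaussian errors: {good:.4f}, skewed errors: {bad:.4f}")
```

The penalty leaves Gaussian-shaped errors almost untouched but charges extra for skewed residuals, mirroring the abstract's goal of shaping the residual noise distribution.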